Reconciling Provenance Policy Conflicts by Inventing Anonymous Nodes

Authors

  • Saumen C. Dey
  • Daniel Zinn
  • Bertram Ludäscher
Abstract

In scientific collaborations, provenance is increasingly used to understand, debug, and explain the processing history of data, and to determine the validity and quality of data products. While provenance is easily recorded by scientific workflow systems, it can be infeasible or undesirable to publish provenance details for all data products of a workflow run. We have developed PROPUB, a system that allows users to publish a customized version of their data provenance, based on a set of publication and customization requests, while observing certain provenance publication policies, expressed as logic integrity constraints. When user requests conflict with provenance policies, repair actions become necessary. In prior work, we removed additional parts of the provenance graph (i.e., not directly requested by the user) to repair constraint violations. In this paper, we present an alternative approach, which ensures that all relevant nodes are retained in the provenance graph. The key idea is to introduce new anonymous nodes to represent lineage dependencies, without revealing information that the user wants to protect. With this new approach, a user may now explore different provenance publication strategies, and choose the most appropriate one before publishing sensitive provenance data.
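The key idea above can be illustrated with a minimal sketch. It is not PROPUB's actual algorithm, which the abstract does not spell out. It assumes the provenance graph is a list of directed edges, and it replaces each node the user wants to protect with a fresh opaque label so that lineage between the remaining nodes stays intact. The function names `anonymize_nodes` and `reachable`, the `_anon` label scheme, and the example graph are all hypothetical.

```python
from itertools import count


def anonymize_nodes(edges, sensitive):
    """Replace each sensitive node with a fresh anonymous label, keeping every edge.

    Assumption: edges are (source, target) pairs; this is an illustrative
    stand-in, not the repair procedure used by PROPUB.
    """
    fresh = count(1)
    alias = {}

    def rename(node):
        if node not in sensitive:
            return node
        if node not in alias:
            alias[node] = f"_anon{next(fresh)}"
        return alias[node]

    return [(rename(src), rename(dst)) for src, dst in edges]


def reachable(edges, start):
    """Return the set of nodes reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


# Hypothetical lineage chain: d1 -> align -> d2 -> filter -> d3
edges = [("d1", "align"), ("align", "d2"), ("d2", "filter"), ("filter", "d3")]
published = anonymize_nodes(edges, {"align", "d2"})
```

In this sketch the published graph no longer names `align` or `d2`, yet `d3` is still reachable from `d1`, so the lineage dependency survives without removing additional nodes.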

Related resources

Repairing Provenance Policy Violations by Inventing Non-Functional Nodes

In scientific collaborations, provenance is increasingly used to explain, debug, reproduce, and determine the validity and quality of data products. In such environments, it can be infeasible or undesirable to publish the complete provenance of all the final output data products. We have developed PROPUB, a system that allows users to publish a customized version of their data provenance, based...

From Web Sources

This paper describes a workflow of simplifying and matching special language terms in RDF generated from trawling term candidates from Web terminology sites with TermFactory, a Semantic Web framework for professional terminology. Term candidates from such sources need to be matched and eventually merged with resources already in TermFactory. While merging anonymous data, it is important not to ...

Secure Provenance for Data Preservation Repositories

The importance of research data preservation and management has been accepted by scientists all around the world. Interest and investment in data preservation projects have become higher than ever before. There are already a number of well-known research data repositories for different types of research data. Data preservation, sharing, discovery and reuse are the key features which are common acro...

What if Multiusers Wish to Reconcile Their Data?

Reconciliation is the process of providing a consistent view of the data imported from different sources. Despite some efforts reported in the literature for providing data reconciliation solutions with asynchronous collaboration, the challenge of reconciling data when multiple users work asynchronously over local copies of the same imported data has received less attention. In this paper, we p...

Provenance-Only Integration

As provenance records are collected from an increasingly diverse set of sources, the need to integrate them grows. The alternative approach of reconciling semantics scales when the records are queried infrequently. However, as the use of provenance grows, normalizing the diverse provenance via formal integration will yield better query performance. We describe two motivating cases for integrati...

Publication date: 2011